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Abstract 

We present a formalism of the transition matrix Monte Carlo method. 
A stochastic matrix in the space of energy can be estimated from Monte 
Carlo simulation. This matrix is used to compute the density of states, as 
well as to construct multi-canonical and equal-hit algorithms. We discuss 
the performance of the methods. The results are compared with single 
histogram method, multi-canonical method, and other methods. In many 
aspects, the present method is an improvement over the previous methods. 
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1 Introduction 

The Monte Carlo technique has served us well in the study of equilibrium 
statistical mechanics and other fields. The traditional local Monte Carlo method 
is simple, extremely general, and versatile. However, there are some intrinsic 
drawbacks. First, the convergence of the results to the exact values is slow. The 
basic probabilistic nature has limited the Monte Carlo error to decrease as 1/ Vi, 
where t is Monte Carlo steps or computer time. With exception of quasi Monte 
Carlo 1^ for numerical integration, as long as we use a probabilistic approach, it 
does not appear possible to overcome this barrier. Most of the work to improve 
the efficiency of Monte Carlo method has been via variance reduction 
which reduces the value of coefficient in front of the law. Next, while the 

traditional Monte Carlo method is good for computing expectation values such 
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as the internal energy and its derivatives, it is more difficult to compute the free 
energy or entropy Q . 

Over the last few decades, a number of methods have been developed to 
compute the density of states. The histogram method |^ and the multiple 
histogram method ||^ can be regarded from this point of view. The multi- 
canonical method |^ in some sense is also a computation of the density of states. 
Both of these methods involve the re- weighting of probabilities to construct the 
canonical distribution. Oliveira et ai proposed a broad histogram method, 
in which the density of states is also computed from simulation. 

If the density of states can be computed with sufficient accuracy, then most 
thermodynamic quantities can be obtained with little further effort. This in- 
cludes the moments of energy, entropy, and free energy. Moreover, the results 
are obtained as a continuous function of temperature from a single simulation. 
In this paper, we present such a method and study its efficiency. This method 
includes the use of a transition matrix , a stochastic matrix defined in the 
space of energy, and a class of related simulation algorithms |2[ The 
present method has the elements of both the broad histogram method and 
multi-canonical method. The flat-histogram algorithm offers an effective way 
to compute density of states n{E) for all energy E. With its multi-canonical 
element, it also offers fast dynamics for systems at first-order phase transitions. 
The use of transition matrix improves the efficiency of data analysis. 

In the next section, we shall discuss the formalism and the essential aspects 
of the method. We also present the results of some numerical tests and discuss 
the connections of our method with previous methods. In the Appendices, we 
give details of a transition matrix dynamics. 

2 Formalism 

2.1 Markov Chain Monte Carlo 

The Monte Carlo method aims at generating samples a with probability dis- 
tribution P{<j), where cr is a particular state of the system. In the Ising 
model, which we shall use as a concrete example, cr is a vector of all the spins 
{di, (72, • • • , ctat}, where Ui — ±1. In the usual application of the Monte Carlo 
method, the invariant distribution P{a) is given by the canonical distribution 
(Gibbs distribution) exp(^~'E{a)/ kgTy However, this need not be the case. In 
the equal-hit ensemble that we shall discuss later, P{a) is not known, and is 
not even unique. Nevertheless, it is still a valid Monte Carlo algorithm that can 
have significant advantages. 

A sequence of states or samples is generated by a Markov chain jlj] with 
transitions between states described by a matrix W{a — s- a'). This is the condi- 
tional probability that state moves to a' given that the current state is a. This 
matrix is known as a stochastic matrix and it must satisfy 

^iy(cr cr') = 1, Wia ^ a')>0. (1) 
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There is considerable freedom in choosing the matrix W, but the most important 
condition (or criterion) is detailed balance 

P{a)W{a ^ a) = P{(7')W{(7' ^ a). (2) 

A Markov chain that satisfies the above condition is called a reversible Markov 
chain. This condition guarantees the invariance of the probability P{cr) with 
respect to the transition matrix W, i.e., 

^P(a)W^(a-.a') = P(a'). (3) 

cr 

Repeated application of W to an arbitrary probability distribution makes the 
resulting probability distribution converge to a fixed point, P{cr). We shall not 
elaborate on the condition that an invariant probability distribution exists and 
is unique. Roughly speaking, we must be able to make transitions in a finite 
numbers of steps from any initial state to any final state. This is known as 
ergodicity. 

The standard Metropolis algorithm is to take 

W{a ^ a') = S{a^ a') min(^l,^y a ^ a' , (4) 

where S{a a') is a selection function - a conditional probability of attempting 
to go to state a' given that the current state is a. Within the above formulation, 
it is required that the S matrix is symmetric, 

S{a a') = S{(j' -> a), (5) 

although this condition can also be relaxed . Following Oliveira , we call 
this condition microscopic reversibility. The diagonal elements of W are fixed 
by the normalization condition, Eq. (^. Note that the diagonal elements are 
not needed explicitly in a computer simulation. 

In a computer implementation, a move is selected according to S (e.g., pick 
a site to flip a spin). The move is made if a random number ^ between and 
1 is less than the flip rate min(l, P((t')/P((t)); otherwise, it is rejected, and 
the original configuration a is counted once more as the next configuration in a 
Monte Carlo move. 

Clearly the above formalism is very general. Although the procedure can be 
used to sample any distribution, it has its limitations. One drawback of standard 
algorithm is that the configurations generated are correlated. These correlations 
severely limit the efficiency of the method near phase transitions or for models 
with competing interactions and many local minima. A number of methods 
have been proposed to address this problem, such as the cluster algorithms JlSf , 
the multi-canonical methods |||, |l9, 20 , rephca Monte Carlo Q, and simulated 



tempering [ p2[ . The flat histogram method presented in this paper is similar 
in some aspects to the multi-canonical method. The implementations of fiat 
histogram method and transition matrix based methods are very simple and 
efficient. 
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2.2 Histogram 



The concept of an energy histogram is essential to all of these methods. Other 
types of histogram of macroscopic quantities can be easily defined in analogy 
to the energy histogram and may be useful in some contexts, such as the joint 
histogram of energy and total magnetization. We define the energy histogram 
(in the case of a discrete energy spectrum) as the number of instances of each 
value of the energy E generated during a Monte Carlo simulation; we denote 
the histogram by H{E). 

The histogram is important because of its direct relationship to the probabil- 
ity distribution of the energy in the system being simulated. If the probability 
of a state a is given by P{(t) for a given simulation, then 

h{E)^Y.^Eia),EP{cT)^ J2 n^) (6) 

cr E(<y)=E 

is the probability that the system has an energy E. If the Monte Carlo simula- 
tion in question generates m configurations, then the expectation value of the 
histogram (average of the histogram over an infinite number of similar Monte 
Carlo runs) is given by 

{H{E))^mh{E). (7) 
For the canonical distribution, we have 

P[a) = f(E{a))/Z^Z-^e^v(-E{a)/kBT^. (8) 

where Z is the partition function, 

then h{E) ~ n{E)f{E)/Z. We define the density of states (for systems with 
discrete energy spectrum) 

niE)= J2 1 (10) 

E{<j)=E 

as the number of states with energy E. 

Note that the configuration dependence of the probability is only through 
energy implicitly. Thus, two configurations with the same energy will have the 
same probability. We shall call this the microcanonical property. The transition 
matrix Monte Carlo to be discussed below relies on this property crucially, while 
allowing the function f{E) to be arbitrary. 

The histogram H{E) sampled during a Monte Carlo run (the number of 
visits to energy E) is an estimator to h{E), i.e., H{E) oc h{E). The usual 
canonical Monte Carlo method is equivalent to using the number of visits H{E) 
to compute the moments of E at the simulation temperature Tq- The histogram 
method of Ferrenberg and Swendsen ||] is based on the simple observation that 
density of states can be estimated (up to a proportionality constant) by n{E) cx 
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H{E)/ f{E). With this information, the moments of E can be extrapolated for 
nearby temperatures as well. 

Clearly, if we can determine n{E), then most of the energy related thermo- 
dynamic averages can be determined, such as internal energy, specific heat, free 
energy, and entropy. The free energy is given then by 

F = -ksTlnZ = -kBT\TL^n{E)e^Y>{~E/kBT). (11) 

E 

Other quantity of interest can also be computed if a "histogram" (as a function 
of E) of such quantity is also collected, 

Y,{Q)e n{E) eM-E/keT) ^(Q)i; H{E) 

(Q>T = -^^^ • (12) 

Y^n{E)eM~E/kBT) E^(^) 

E E 

The main objective of this paper is to show that we can determine the density of 
states n{E) for the whole range of energy with Monte Carlo sampling efficiently. 



2.3 Detailed balance for histogram 

The transition matrix defined below serves a dual purpose — for the computation 
of the density of states and for the construction of flat histogram algorithms. 
There are a number of ways to look at the transition matrix based methods. We 
shall take the detailed balance equation, (||), as a basic starting point. Consider 
all initial states a with energy E and all final states a' with energy E'. Each 
pair of states {a, a'} has a detailed balance equation. Some of the equations 
may be the identity = if the transition by a single-spin flip is not possible. 
Summing up the detailed balance equations for all the states a with a fixed 
energy E and all the states a' with a fixed energy E', we have 

J2 E Pi<^)Wia^a')= J2 E Pi<^')Wia'^a). (13) 

E(a) = E E{a') = E' E{cr)=E E(cr') = E' 

Assuming that the configuration probability distribution is a function of energy 
only, i.e., -P(cr) cx f(^E{a)'j, and defining the transition matrix in energy as 

^(^-^^') = ;^ ^ ^ t^(a-a'), (14) 

^ ' E{a)=E E{(j')=E' 

we have 

n{E)f{E) T{E E') = n{E')f{E') T(E' E). (15) 

As a consequence of W being a stochastic matrix, T{E E') is also a stochastic 
matrix: 

^T{E ^ E') = 1, T{E ^ E')>0, (16) 

E' 
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with the histogram h{E) = n{E)f{E)/Z being the invariant distribution: 



J2h{E)T{E^E')^hiE'). (17) 



Similar definition to Eq. (14) was introduced in ref. [ESl in a different context. 



2.4 Broad histogram equation 

Because the matrix T{E E') is composed of two factors, only the second of 
which depends on the specific ensemble under consideration, it is convenient to 
refer all calculations to the "infinite-temperature" transition matrix, 

T^[E^E')^^{N{a,E' ~E))e. (18) 

where N is the number of spins, or more generally, the number of allowed moves 
from a given state. If we define AE — E' — E, we then have 

{N{a,AE))^ ^ 1 ^ N{a,AE) ^ ^ ^ S{a ^ a') 

N n(E) ^ N 2^ 2^ n(E) ' ^ ^ 

^ ' E(a)=E E(a)=E E((j')=E' ■ ' 

In a random single-spin-flip dynamics, 5(ct — *■ cr') equals 1/A^ if the two con- 
figurations G and a' differ by one spin, and zero otherwise. Thus, the second 
summation over a' gives the number N{a^E' — E) of configurations of energy 
E' that can be reached from cr of energy by a spin flip. The first summa- 
tion is over the configurations with energy E^ i.e., a microcanonical average of 
the quantity N{a^lS.E). The constancy of 5(ct — > a') for the nonzero matrix 
elements is important for this interpretation of {N{a,AE))E- The quantity 
N{a, AE) is central to the current method, as well as to the broad histogram 
method [||. 

Within the single-spin- flip dynamics, the matrix T is then given by 

TiE^E')^To,{E^E')a{E^E'), (20) 

where any flip rate a{E E') can be inserted once Too{E E') has been 
determined. Substituting Eq. ( pO| ) into the energy detailed balance equation 
(^Sj), we can cancel f{E) and a{E E') as for a valid dynamics which generates 
distribution P(cr) = /(i?(a-)) /Z, we have the usual detailed balance, 

<E ^ E') ^ f{E2 , . 

a{E'-.E) f[E)- ^ ' 

The final equation is known as broad histogram equation, initially presented by 
Oliveira et al 

n{E){Nia,E' -E))E^n{E'){N{a\E~E'))E'. (22) 
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In terms of the transition matrix notation, this becomes 



n{E)TooiE ^ E') = n{E')T^{E' ^ E). (23) 

The name "broad histogram" equation is historical and clearly a misnomer. The 
above equation has a very simple interpretation. Consider all pairs of states a 
with energy E and states a' with energy E' such that the moves (or transitions) 
between a and a' are allowed. These states correspond to states for which the 
matrix elements in S are non-zero. Due to the microscopic reversibility, if a to 
a' is allowed, so is the reverse move from cr' to a. There are two ways to count 
the total number of moves, summing up from states with energy E or summing 
up from states with energy £". The state a has N{a, E' — E) ways to move into 
energy E' . The total number of moves to energy E' from all states with energy 
is S_E(o-)=_B ^{'^■1 ^' ~ By the reversibility requirement, we must have 

J2 N{a,E'^E)= N{a',E-E'). (24) 

E(a)=E E(a')=E' 

A microcanonical average of any quantity is defined by 

/ J E{<j)=E 
E{(j)=E 



Using this definition, the previous equation (24) is reduced to Eq. (|22|). This 
argument is first put forth by Oliveira jlTt and by Berg and Hansmann psf . 
Clearly, the result docs not depend on what type of moves we use, so long as it 
satisfies the reversibility condition. 



2.5 TTT identity 

The detailed balance condition imposes a restriction on the transition matrix, 
which we call the TTT identity. Consider three distinct energy levels, i?, E' ^ 
and E" , for which energy transition matrix elements among them are nonzero. 
Let us write down three equations associated with the transitions among them: 

h{E)T{E E') = h{E')T{E' E), (26) 
h{E')T{E' ^ E") = h{E")T{E" ^ E'), (27) 
h{E")T(E" -> E) = h{E)T{E E"). (28) 

Multiplying the left and right sides of the three equations together and assuming 
that the product h{E)h{E')h{E") is nonzero, we can cancel this factor from the 
equation and obtain: 

T{E E')T{E' E")T{E" -> E) = T{E' -> E)T{E E")T{E" E'). 

(29) 
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This is the TTT identity. The importance of this equation is that it does not 
require the knowledge of the stationary distribution to check for agreement of 
the data with the condition of detailed balance. While for normal Monte Carlo 
simulation, the detailed balance is built-in directly to the transition matrix W , 
this is not the case for some of the transition matrix Monte Carlo schemes that 
have been proposed. One implication of detailed balance violation is that the 
microcanonical property that all states with the same energy have the same 
probability is violated. This detailed balance violation for the initial choice of 
Oliveira's broad histogram dynamics, a particular choice of the transition rate 

has been demonstrated explicitly for small systems |]lT| . 

The significance of the TTT identity is that given the probability h of energy 
having value E, if we can predict h" at energy E" by the detailed balance 
equations in two ways, one directly from E to E" ^ one by two hops, from E to 
E' , and then E' to E" , then the TTT identity guarantees that the results are 
exactly the same. That is, 

and 

^, T{E' ^ E") , TjE ^ E') 

^ T{E"^E'y ^ =^T{E'^Ey 

The TTT identity says that the two predictions based on the detailed balance 
are equal, h" — h" . 

Detailed balance implies TTT identity. Is the reverse also true? I.e., given 
a complete set of TTT identities, do they imply detailed balance in the sense of 
equation h{E)T{E E') = h{E')T{E' E) for all E and E'l The answer is 
yes. The TTT identity turns out to guarantee a consistent (detailed balance) 
solution involving three jumps, say, E to £", to E" , to E'" , versus E to E'" 
directly when such jumps are allowed. Therefore, TTTT identities that follow 
from detailed balance are automatically fulfilled and neither provide further 
information, nor require separate proof. Naturally, when we consider adding 
more complex Monte Carlo moves, either to improve the efficiency of a calcula- 
tion or to reflect the nature of a more complex model, more TTT identities are 
generated and identities with more factors of T are automatically satisfied. 

We define a quantity to measure the detailed balance violation for three 
energy levels for which the transitions among them are nonzero as 



^ T{E E')T{E' ^ E")T{E" E) 



T(E' ~* E)T{E ^ E")T{E" E') 



(32) 



where T(- • •) is Monte Carlo estimate of T(- • •). For a single-spin-flip dynamics 
with the Metropolis rate, the energy transition matrix is given by Eq. (po|), with 
a{E E') = min(l, /(i?')//(£')), thus the above equation is equivalent to 



^ {Nia, E' - E))E(Nia', E" - E')) e' (Nia" , E - E"))i 



{N{a', E - E'))e' {N{a, E" - E))e{N{(t'\ E' - E"))i 



(33) 
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For Ising model where energies are equally spaced, we consider three levels at 
E, E + AJ and E + 8J, where J is the coupling constant. A plot of v for Ising 
model is presented in ref. |Q . 

2.6 Flat histogram dynamics 

If a configuration has probability P{cr) — f {E{a)'^ / Z , then the histogram is 
H{E) cx n{E)f{E). A flat histogram is obtained if we take f{E) cx l/n{E). 
A single spin flip with the flip rate inm(l, n{E) /n{E')^ can be used to do the 
simulation. Lee's reformulation of multi-canonical method p9| ] is essentially 
this. The trick there is to determine n{E) efficiently [ p6[ . 

From the equation describing detailed balance for the transition matrix, , 
we can write the acceptance rate as 



a{E^E')=mu,(l,^^^-l^] 
^ ' V 'Too{E^E')J 



(34) 



This is the first equation derived for flat histogram dynamics, although we 
will show below that it is not unique. This rate is first proposed in ref. ]Tl| , 
and is independently discovered by Li pTf . Unlike the quantity n{E), a good 
approximation is already available in the very beginning, since we can use the 
instantaneous value N{a, E' — E) as a. preliminary estimate for Too{E E'). A 
cumulative average of contributions to Too{E — > E') can be used as a convenient, 
and remarkably accurate, approximation for the microcanonical average. We 
shall discuss how good this scheme is in a later section. 

There is another equivalent way of looking at the problem. Consider the 
energy detailed balance equation in the form 

h{E)Too{E E')a{E E') = h{E')T^{E' E)a{E' E). (35) 

If we require that the histogram is a constant, h{E) = h{E') = const, then the 
spin-flip rate must satisfy the following equation, 

a{E ^ E') ^ T^jE' ^ E) 

a{E'^E) T^iE^E'Y ^ ' 

Clearly, Eq. (|4|) satisfies the above equation. Moreover, there is a whole family 
of choices of the transition rates. Some of the choices are given in Table 0. 



2.7 N-fold way 

The standard Metropolis algorithm contains two steps. First, a move is pro- 
posed. Next, this move is accepted with probability a{E E') or rejected with 
probability 1 — a{E £"), where < a{E E') < 1. Is it possible to always 
accept a move without sampling the same configuration repeatedly? The an- 
swer is yes, if we are willing to keep extra information during the simulation. In 
the flat histogram method, this extra information is already there for free. It is 
precisely N{a, AE). 
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In an iV-fold way simulation (also known as event-driven simulation) p8{ , 
we do not change the dynamics; it is fully equivalent to the usual single-spin-flip 
dynamics. However, there is a substantial improvement in efficiency in those 
situations for which the rejection rate is high. The method begins by computing 
the total probability that a move would be accepted with the standard approach. 
For the Ising model, this probability is 



^ 1 

i=l 

where the factor is due to the fact that each spin located at site i is picked 
up with probability 1/iV. The notation ct* refers to a configuration with the 
spin at site i reversed in sign. The quantity A is the probability that any spin 
is flipped. Since the flip rate depends on the initial and final energies only, we 
can simplify the above equation as 

^ = E^^^«(^-^ + Ai.). (38) 

AE 

We divide the possible moves into classes according to their energy increment 
AE. Within a given class, each spin has the same flipping probability. We now 
set the probability of the class with energy change AE being chosen as 

P{AE) ^ ^N{a,AE)a{E E + AE). (39) 

A spin in this class is then chosen at random and flipped with probability one. 
As a practical consideration in designing algorithms, we note that the condition 
that a{E E') must be between and 1 can now be relaxed because of the 
normalization by 1 /A in this equation. 

In the original algorithm |Q , Monte Carlo time is rescaled to make the dy- 
namics equivalent to that of the usual algorithm. For the purpose of calculating 
averages, we will reweight each configuration to achieve the same effect. Each 
configuration a in the original single spin flip has equal weight. The Monte 
Carlo average of a quantity Q is computed as 

{Q) = -T.Q- (40) 

t=i 

Here Qt at step t and the subsequent steps could be the same, due to the 
possibility of rejecting a move. The probability that a move is rejected is i? = 
1 ~ A. Thus the life-time of a configuration has the probability distribution 

Pt^ {I- R)R*^\ t=l,2,3, (41) 

The average life-time for configuration a is then 

oo 

t = YtPt = = -. (42) 

^ 1- R A ^ ' 

t=i 
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In the A''-fold-way simulation, each step generates a distinct state from the 
immediate preceding one; each of these states is supposed to last for a during 
of i, on average. Thus in replacing Eq. ( ^0| ) the correct formula for statistical 
average with A'^-fold-way simulation is 

m „ / ™ 1 

t=l ' t=l 

Naively the computation of N{a, AE) seems to require 0{N) basic steps at 
each single-spin-flip move. But the effect of changing configuration by a flip is 
local, involving only the site in question and its neighboring sites. We only need 
to compute the changes in N{a, AE). Thus each move takes 0(1) in computer 
time. It is few times slower than a corresponding straightforward singlc-spin-flip 
program. The iV-fold way does require extra memory, as a list of spins for each 
class is required in order to be able to pick a spin from the class with a computer 
time of 0(1). 



2.8 Equal-hit algorithms 

The flat histogram ensemble in some sense is the best ensemble to evaluate 
the density of states, for each energy level is sampled with the same frequency. 
However, as we have seen in the A^-fold way, this is not entirely true, as some 
conflgurations are weighted more than others. The equal-hit algorithms |Q 
generate fresh conflgurations with equal probability at each energy. There is a 
very interesting aspect of these algorithms — the histograms in such algorithms 
are not unique and depend on the details of the dynamics. 

The energy histogram in the normal single-spin-flip dynamics (that is, not 
in the A^-fold way) is computed as 

HE) = (44) 

i.e., the contribution to the histogram from the conflguration cr of energy -B(cr) 
is 1 for E = E{<t) and zero for other energies. The angular brackets refer to 
Monte Carlo average. In the iV-fold way, the statistics are weighted with 1/A 
to get the equivalent result in the original dynamics, so we have 

The angular brackets indicate simple arithmetic average over the samples gen- 
erated in an A^-fold way, c.f. Eq. (4^). We put a subscript N to emphasize the 
fact that the average is over iV-fold way samples. We deflne the hits as 

uiE) - {SE(a),E)N. (46) 

This quantity measures the average number of fresh conflgurations generated 
at each energy, since in the A^-fold way, each conflguration a in the sample is 
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distinct from the previous sample due to the fact that there is no rejection in 
TV-fold way. We can relate the histogram to the hits by 

h{E) = u{E){1/A)em I (1M>N, (47) 

where 



(1M>^,N = ^ I ^ '"^ = (.A)t (48) 

is the average of inverse acceptance rate for states of fixed energy in the A^-fold 
way samples. Note that this is not the same as the microcanonical average of 
1/A, which in general is 

\Q{o))e —prTTi • (49) 

Substituting Eq. ( ^7| ) into energy detailed balance equation, (|35|), we have 

u{E){\IA)e,^T^{E E')a{E E') - 

u{E'){1/A)e'A{E' ^ E)a{E' ^ E). (50) 

Equal-hit is realized if the flip rate satisfies the above equation with u{E) = 
u{E') = const. For example, 

(T, ■ (^ {^IA)e'A{E'^E) \ 

^^^-'^^= il/A)EME^E') ) ■ ^''^ 

Other choices are also possible; some of them are given in Table 

We note that since the transition rate depends on the underlying dynamics 
through A, there is no guarantee that the histogram h{E) is unique. In fact, 
in the equal-hit dynamics, h(E) is not known, and has to be determined self- 
consistently through the equal-hit dynamics. We note that A is defined by 
Eq.(p7|) using a, which uses A in Eq. (|l|). There is also a peculiarity that h{E) 
diverges for some choices of the transition rates for the ground states. 



3 Numerical Tests 

In this section, we evaluate the performance of various proposed algorithms. In 
Table |l|, we list a dozen possible choices of flip rates, a{E E'). For each rate, 
the simulation can be implemented with or without the iV-fold way. 

As the flip rates require the knowledge of exact microcanonical average, 
which is not available, we use the crucial approximation 

^ m 

TUE -^E + AE)^ J2 SEi.n,EN{a\AE), (52) 

where energy histogram H{E) is the number of samples generated for a given 
energy, H{E) ~ J2tLi ^E{at),Ej ™ is the total number of samples generated so 
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Figure 1: Energy histogram H{E) and number of hits U{E) in A^-fold-way algo- 
rithms apphed to the 8-state ferromagnetic Potts model on a 16 x 16 lattice with 
10*" Monte Carlo steps. The top part is the standard flat histogram algorithm 0; 
the bottom part is the equal- hit algorithm 1. The vertical scales are arbitrary. 

far, and ct* is the configuration at step t in the algorithm. We collect a sample 
after every attempt of moves. The above expression is suitable for the normal 
single-spin-flip algorithm. In iV-fold way, the statistics have to be weighted by 
1/A{a). Whenever information is not available, we set the flip rate to 1. This 
biases the system to visit unexplored energy levels. 

In the above method, the simulation is started automatically, without an 
iteration process. This bootstrap is efficient and is also correct in the sense that 
it converges. We shall call the above method iteration 0. Strictly speaking, 
iteration need not be a valid Monte Carlo algorithm as the transition rates 
are fluctuating quantities, thus the normal Markov chain theory for convergence 
does not apply. However, numerical results do support convergence, although a 
rigorous proof is lacking. 

3.1 Histograms 

Figure ^ presents energy and hit histograms for a 16 x 16 square lattice for 
an 8-state ferromagnetic Potts model. As expected, the energy histogram is 
flat for the flat histogram algorithm. The hits are roughly proportional to the 
number of different configurations generated. This is not exactly true since in 
the A^-fold way, we can only guarantee that the next configuration is different 
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Figure 2: Correlation times defined by Eq. (^3|) for the Ising model versus lattice 
linear size L {— N^^'^). Diamonds are for one-dimensional chain with algorithm 
and iV-fold way; solid circles are for square lattice with algorithm —1; open 
squares are for square lattice with algorithm 0; triangles are for cubic lattice 
with algorithm and TV-fold way. 

from the immediate preceding one. The new configuration can be the same 
with configurations in earlier steps. By requiring that the hits are equal for all 
energies, we obtain the equal-hit algorithm with the corresponding energy and 
hit distribution shown in the lower part of the figure. 

Due to the statistical nature of the histogram and also due to the fact that 
the energy range is explored similarly to a random walk, the histogram is not 
exactly flat, but fluctuates around a mean. During a simulation of length t of 
Monte Carlo steps (sweeps), we have generated t/r independent samples, where 
T is correlation time related to the histogram. These samples are distributed 
to order N bins of different energies (for the two-dimensional Ising model, it is 
exactly — 1 bins). Thus, each bin has about t/{TN) independent samples. 
The relative fluctuation of the histogram is (asymptotically for large t) 

Although the above argument applies for the fluctuation between different runs, 
it is also reasonable to apply it to the fluctuation among different energies, since 
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multi-canonical 





min (1, a/b) 


6.43 


original flat histogram 


1 
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standard equal-hit 
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aU 


9.5 
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l/{bV) 




^-1/6 glow convergence 
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^-1/3 giQ^ convergence 
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7.0 
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1/6 


7.2 




7 


a/ {a + b) 


6.4 
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{a + b)/b 


9 
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N/N{a,E' - E) 




definitely not converge 


10 


U/b 


8.1 


^-1/5 glow convergence 


11 


a/V 




12 


UifaU < bV, else aU/b 
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Table 1: A list of choices of the flip rates and their errors in transition matrix 
with respect to the exact results on a 5 x 5 square lattice for the Ising model. 
In the formula, we have a = ^{N{a',E - E'))e', b = j^{N{a,E' - E))e, 
U = (l/A(a'))_E',N, V = {1/ A{(j)) E,N, where E is the current energy and E' is 
proposed new energy. In simulation, the exact microcanonical average (■••)£; is 
replaced by a cumulative average. 



the samples between different energies are assumed independent. The same 
analysis also applies to the hits in equal-hit algorithm. 

Equation ( ^3|) can be used as a deflnition for the correlation time r. A perfect 
Poisson process has a constant correlation time (t = 1); an ideal random walk 
in energy is r cx A'^. We compute r for the nearest neighbor Ising model on 
one-dimensional chain, two-dimensional square lattice, and three-dimensional 
cubic lattice. We found that numerically Eq. (|5^) is approximately satisfled, 
with the correlation time growing with size, r cx L in one dimension, t cx L^ "^ 
in two dimensions, and r oc L^'^ or in three dimensions, see Fig. (||). The 
algorithm becomes inefficient (slow convergence) for large lattices in two and 
three dimensions. We shall discuss this further later on. 



3.2 Rate of convergence 

We tested the zeroth iteration algorithms for convergence to the exact values for 
the infinite temperature transition matrix, Too{E — > E') — jj{N{a,E' — E))e, 
on a 5 X 5 Ising square lattice model. The exact result Too is obtained by an 
exhaustive enumeration. Figure ^ is a plot of overall error in the A^-fold-way 
simulation data T, defined by 
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Figure 3: Average errors in the calculated transition matrix Too on a 5 x 5 Ising 
model in A^-fold-way simulation. The numbers correspond to various choices of 
flip rates given in Table 

versus Monte Carlo simulation length t (averaged over many runs). The Monte 
Carlo times are in units of sweeps {N moves). The asymptotic value erVt for 
large t is listed in Table |l], which characterizes the rate of convergence. 

As expected, the best algorithm is algorithm number —1, when the input 
n{E) is exact. The algorithms (or rates) number 0, 1, and 7 are only slightly 
worse than the best. Numbers 5 and 6 are the second best. The rate numbers 
3, 4, and 11 may still converge extremely slowly or not converge at all. We do 
not understand why this is so. Number 9 does not converge, which we know 
since it does not satisfy detailed balance equation. Using instantaneous values 
rather than average values is clearly wrong. 

Errors for single-spin-flip algorithm are about 1.5 larger than with the A^- 
fold-way, or about 1.5^ « 2.3 less efficient in Monte Carlo steps. However, since 
the A^-fold way is slower in CPU time than single-spin flips by a factor of 2 or 
more, the two methods are comparable in overall efficiency in this particular 
instance. 

From the above results we conclude that even the zeroth iteration converges 
to the correct results. This does not mean that the rate of convergence is uniform 
in E. In fact, we found for large lattices, the violation of detailed balance, v, is 
large at the two ends of the energy spectrum [n2| . 
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3.3 Determining the density of states 

There are a number of different ways of determining the density of states. The 
matrix T^{E E') has eigenvalue 1 with corresponding left eigenvector n{E). 
However, the solutions of the eigenvalue problem are numerically unstable. The 
broad histogram equation, (p2|), can be used. In the simplest application, we can 
ignore the extra equations, and consider only these with smallest Ai? ^ E' — E, 
and obtain solution from iteration, 

\nniE') = \nniE) + ln ^^||,^^j . (55) 

Since there are more equations of this type than the unknown n{E), we can use 
least-squares method. Our experience suggests that we should view the problem 
as an optimization with nonlinear constraints. 

There are two possible models for an optimization solution. Introducing the 
optimization variable S{E) = hin{E), consider. 



subject to any known constraints. For the d-dimensional Ising model, we have 
S{-E) = S{E), S{-dJN) =ln2, ^ exp(S'(£;)) = 2^. (57) 

E 

The three conditions are the symmetry between low and high energies, the 
degeneracy for the ground states, and the total number of states. The weight 
(J^{E, E') is the variance of the Monte Carlo estimates of the quantity hi{Too{E 
E')/Toa{E' £')). The above minimization problem is essentially a linear 
problem. We solve it by an iterative method. 

The second, different formulation of the optimization is expressed in variable 
T^iE^E'), 

minimize ^ (Too{E ^ E') ~ f^{E ^ E')f , (58) 

E,E' '^E,E' ^ ' 

where Too{E E') is Monte Carlo estimate with error as.E', and T^{E E') 
is unknown. The minimization is subject to the conditions 

0<T^{E^ E')<\, ^Too(i;^S') = 1, 

E' 

\{t^{E^E') = 1[tUE'~^E). (59) 



The last one is symbolically a TTT identity, see Eq. ( ^91 ) for a concrete example. 
For the Ising model, there is also an additional symmetry relation 

T^{E^E + AE)=T^{-E^-E-AE). (60) 
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Figure 4: Average relative error in density of states e„ as a function of Monte 
Carlo step t. Open circles are for algorithm 0; solid triangles are for a two-stage 
algorithm (algorithm followed by algorithm —1). For each set, the system 
sizes are 4 x 4, 8 x 8, 16 x 16, and 32 x 32, from bottom to top. 



This set of constraints is much more difhcult to handle. Intuitively, this second 
optimization problem should give better result than the first one. However, this 
is not the case, at least for the two-dimensional Ising model. For this model, 
optimization in Too(- • •), Eq. (58), gives twice the error (e^ defined below) of 
the first optimization method. A simple iteration with Eq. ( |55| ) and AE = ±4 J 
has 4 times the error comparing to optimizing solution of Eq. (|5^ ) and (^7|). 

Figure ^ is another convergence test plot for algorithm and a two-stage 
simulation, both with A'^-fold way, on the two-dimensional Ising square lattices. 
In this plot, we consider the relative error per energy level for density of states. 



N 



HE) 



n{E) 



- 1 



\S{E)-S{E)\ 



(61) 



The normalization by — 1 is to exclude ground state energy and its symmetric 
state energy, for which the exact values are imposed. The exact value n{E) is 
obtained according to ref. ||2^; n{E) is Monte Carlo estimate obtained from 
solving Eq. ( |5^ ) with aE,E' — 1- 

We see signs of slower convergence for large lattice sizes for algorithm 0. A 
two-stage simulation improves the efficiency to that of using the exact density 
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L 



Figure 5: Tunneling time dependences on system sizes for three-dimensional 
± J spin glasses. The diamonds are from single-spin flip (without iV-fold way) 
with algorithm 0; the circles are A^-fold-way algorithm 0; and the triangles are 
A^- fold- way algorithm 1. 

of states in flip rates. We first apply the algorithm using cumulative average 
in the flip rates; we then apply algorithm —1 (Lee's method of multi-canonical 
algorithm), using the density of states obtained in the first step. Both steps 
use the same number of Monte Carlo sweeps. With the two-stage algorithm, 
the slowing down is roughly r„ oc L^'^, using a similar definition as given in 
Eq. (||. 

3.4 Dynamic characteristics of the algorithms 

We have already presented two correlation times — r characterizes the conver- 
gence of the histogram and t„ characterizes the convergence of the density of 
states. Both of them showed the effect of reduced performance when the size 
is increased. By definition of the correlation times r and r„, they also mea- 
sure roughly how many Monte Carlo steps are needed to generate independent 
samples. Of course, in r„, it also reflects the effect of data analysis methods. 

Another measure of the performance of algorithms is the tunneling time. In 
our study, we define the tunneling time as the average Monte Carlo steps that 
the system in the lowest energy state goes to the highest energy state, or vice 
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versa. More precisely, as soon as a ground state is reached, we record the current 
Monte Carlo move ti, and then look for the highest energy, which may happen 
at t2- We then look for the ground state again. The difference (^2 — ti)/N 
consists of one sample for the tunneling time. 

The tunneling time for the two-dimensional Ising model is very well described 
by Tf K, 0.4L^® (in units of sweeps). An ideal random walk in the space of energy 
would have an exponent of 2 (jt (x N in general). The dynamics is close but 
not quite random walk in energy. 

The spin glass is one of the most difficult systems to simulate. The perfor- 
mance of the flat histogram algorithm for the two-dimensional ± J spin glass is 
presented in ref. [3C|] . In Fig. ^ we show the tunneling time as a function of 
system linear size L for three algorithms (algorithm and 1 with iV-fold way, 
and algorithm without A^-fold way) on the three-dimensional Ising spin glass. 
The gain from A'^-fold way as comparing to standard single-spin-flip is by a con- 
stant factor. The equal-hit algorithm is about a factor of 6 faster in tunneling 
times than the algorithm without iV-fold way. Unfortunately, this gain is not 
very significant as the iV-fold-way program runs few times slower than standard 
single-spin flips. The slowing down exponent is about 8, this is comparable to 
that of multi-canonical method . 

4 Energy Transition Matrix Dynamics 

The stochastic matrix W describes Monte Carlo dynamics in the space of spin 
configurations. Such state space is extremely large, containing 2^ states, from 
which Monte Carlo moves sample only a very small fraction. On the other hand, 
we introduced a new stochastic matrix T in a coarse-grained space of energy. 
The matrix W and T are related by Eq. (p^. We shall call the dynamics induced 
by this stochastic matrix energy transition matrix dynamics. In discrete time 
step i, the dynamics describes the evolution of the histogram hi{E) as 

h,+i{E) = HE')TiE' ^ E). (62) 

E' 

What is the significance of this dynamics? The dynamics describes the change 
of energy distribution through the following single-spin-flip moves: given the 
current state a with energy E, pick a new state a' with the same energy E 
among all the n{E) degenerate spin states with equal probability, flip a spin 
according to the usual spin flip rate as embedded in W. As we can see, since 
the state changes at random to a completely new state of the same energy to 
try another flip, its dynamics is substantially faster than single spin flip or even 
cluster flip. Unfortunately, such dynamics is not realizable on a computer, but 
it is of interest for comparison with realizable algorithms. 

We can say more about the dynamics given by Eq. (|2|). Let us first convert 
the equation into continuous time which is more convenient for analytic treat- 
ment, and which is a valid description for moderately large system. Introducing 
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t = i/N and M=l/N and define h{E,i/N) = hi{E), we have 

h (e, '-^) -h(E,^)=J2 HE', ^/N) (t{E' ^ E) - 5e^^ . (63) 



Dividing both side by taking the limit of large N , we have 
dh{E,t) 



Y,HE',t)f{E' ^E), (64) 



9i 

where the continuous time transition matrix T is related to the discrete step 
matrix by f{E ^ E') = {T{E E') - Se,e')N. 

Two results were initially reported in ref. [ p^ . Detailed derivations will 
be given in Appendices. Firstly, an explicit form of T can be given for the 
one-dimensional Ising model with Glauber flip rate, as 

f(fc-fc-l) = + (65) 

(N - 2k){N - 2k - 1) , , , , 

T{k^k + 1) = ^ 2(]V^T^ -i^-l)^ (66) 

where k = {E/J + N)/4, 7 = tanh(2 J/(fcsT)) , and N is the chain length. The 
diagonal term is computed from the relation 

f (fc fc- 1) +f (fc ^ fc) +f (fc -> fc + 1) = 0, (67) 

and the rest of the elements T{k — > fc') = if \k — k'\ > 1. Secondly, equa- 
tion (|6^) is continuous in time but still discrete in energy. We can go one step 
further to consider the limit of both time and energy to be continuous. For tran- 
sition matrix associated with canonical ensemble, we found a partial differential 
equation in such limit as 

dh{x',t') _ d fdh{x',t') 



(68) 



dt' dx' 

where x' and t' are properly scaled energy fluctuation and time: 
and t' = At with 

A= \im J2f{E^E), (70) 

E 

where itg is the average energy per spin and c = k^T^c is the reduced specific 
heat per spin. This equation, (68), is equivalent to the one-dimensional quantum 
harmonic oscillator equation, thus the analytic solutions are readily obtained. 

The most important consequence of this equation is that the relaxation time 
is proportional to the specific heat of the system. This result can also be seen 
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from a less rigorous point of view. Since this artificial dynamics involves a 
random walk on the probability distribution of the energy, the characteristic 
relaxation time will be proportional to the square of the energy fluctuation, 
{E — (_B))^y which is in turn proportional to the specific heat. 



5 Connections with Other Methods 
5.1 Single histogram method 

In the single histogram method j^, one performs a canonical ensemble simula- 
tion at a fixed temperature To, and collects the histogram H{E). The histogram 
is proportional to n{E)exp{—E/kBTo), so an estimate to the density of states 
is obtained from 

n{E)^H{E)exp(-^]. (71) 



Once we have the density of states, we can use it to evaluate thermodynamic 
quantities at any temperature. 

Unfortunately, since H{E) is approximately a Gaussian function with mean 
{E)to and variance ksTQcN, where c is the specific heat per spin, the accuracy of 
the estimates deteriorates exponentially outside the energy window of order \/N, 
c.f. Fig. Detailed error analyses for energy and specific heat are given in 

The region of good accuracy coincides with the critical region at a second 
order phase transition, so the single histogram method is still an extremely 
valuable tool to study phase transitions. 

The transition matrix approach can also be used in a way similar to single 
histogram method, i.e., collecting the statistics of the transition matrix in a 
canonical simulation. Numerical comparison suggests that the two methods are 
comparable in accuracy. In fact, as we can see in Fig. for a certain interval, 
the transition matrix gives results which are up to 10 times better, but become 
comparable or worse outside the limited range of E. If we use the two results to 
compute the average energy or heat capacity, we found that the errors are about 
the same The reason is that the contributions to errors are dominated by 
the tails of the histogram distribution H{E), at these ranges, the density of 
states is of comparable accuracy. 

It is somewhat disappointing that the single histogram method and transi- 
tion matrix Monte Carlo analysis are of the same accuracy. Some improvement 



can be made by a careful analysis using Baysian method 1 34 . But it is unlikely 



that we can bring about an improvement of order v N for the accuracy. 



5.2 Multiple histogram and multi-canonical methods 

Both of the multiple histogram method and the multi-canonical method Q 
give density of states over a wide range. While the multiple histogram method 
uses a collection of standard canonical simulations, the multi-canonical method 
uses only one simulation. In reality, a multi-canonical simulation needs to be 
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Figure 6: Errors in the density of states for a 32 x 32 Ising model from a 
canonical simulation at fcsTc/J ~ 2.269, with 10^ Monte Carlo steps. The 
dotted line labeled FS is by single histogram method; the solid line labeled 
TMMC is obtained by energy transition matrix with the same simulation. The 
relative error is computed from \n{E)/n{E) — 1|, where n{E) is the exact value, 
h{E) is Monte Carlo estimate, and the error is an average over 48 simulations. 



iterated few times to converge to the desired ensemble. In this respect, the flat 
histogram method takes at most three iterations. The first iteration already 
gives excellent results, although there are noticeable biases for large systems; 
the second iteration with fixed flip rates greatly improves the accuracy; the third 
iteration would give correct sample average for the transition matrix as well as 
correct multi-canonical ensemble. 

The additional benefit of using the transition matrix is improved accuracy 
comparing to other methods, within the same simulation runs. Figure ^ shows 
the accuracy of the density of states for two-dimensional Ising model on a 16 x 16 
lattice. We note that the accuracy is sensitive to the constraints imposed with 
the optimization. This extra accuracy comes about due to the nature of the 
samples that are taken. In histogram or multi-canonical methods, each new state 
gives one count in the histogram, while N counts are collected from each state 
for the matrix. Naively, we expect an improvement by a factor of N in terms 
of the variance, since each state contributes 1 for the histogram, and each state 
contributes a number of 0{N) for the transition matrix. While the accuracy 
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E 



Figure 7: Errors of a single simulation in the density of states for a 16 x 16 
Ising model from algorithm —1 (Lee's dynamics). The Monte Carlo steps for 
the simulation are 4 x 10^. The density of states are calculated by (a) Berg's 
method, (b) transition matrix with normalization constraint n{E) — 2^ , (c) 
transition matrix with the constraint of groundstate degeneracy n(0) = 2, and 
(d) transition matrix with both constraints of (b) and (c). Due to symmetry, 
only half of the data are plotted. 

of the transition matrix elements does improve as the system size increases 
(may be by \/\fN for the error), as has been pointed out by others |35[ this 
accuracy is lost in the density of states. This is due to accumulation of errors, as 
the transition matrix elements only determine the ratio of the density of states, 
c.f. Eq. (|55|). If we use a simple iteration method starting from the ground state, 
we see that this extra accuracy in the matrix elements gets canceled exactly by 
the accumulation of errors. However, the optimization methods of determining 
n[E) make the error analysis difficult. Here are some quantitative comparisons. 
We take the exact multi-canonical rate (algorithm —1) in simulation, and collect 
both the transition matrix and the histogram. With transition matrix, the 
average relative errors of the density of states defined by Eq. (p^) for the two- 
dimensional Ising model of size L = 4, 8, 16, 32, and 50 with 10^ Monte Carlo 
steps in each run are 0.0003, 0.0012, 0.0037, 0.011, 0.024, respectively. The 
corresponding results computed by histograms are 0.0033, 0.010, 0.027, 0.058, 
0.10, respectively. In general, transition matrix method performs better than 
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histogram based methods, as we have shown numerically. 

5.3 Transition probability methods 

A proposal to use the transition matrix was given by Smith and Bruce in 
1995 in connection with multi-canonical simulation. This is further developed 
by Fitzgerald et al The canonical transition probability (CTP) method 

also estimates the transition matrix and uses energy detailed balance equa- 
tions to estimate the canonical distribution. In the simplest version, instead 
of collecting the histogram H{E), a matrix H{E E') is incremented by 1 
for every Monte Carlo move from state with energy E to state with energy E' . 
Clearly, this quantity is an estimator of h{E)T{E E'). The transition matrix 
is obtained by H{E E')/ H{E). Both of the above methods and the present 
method are similar in the way that transition matrix is used. However, there are 
two important differences: (1) in CTP method, only the current move is used 
for statistics, not all possible moves of the state; (2) the simulation is performed 
in canonical ensemble at a given temperature. 

5.4 Wang-Landau method 

Very recently, Wang and Landau proposed an intriguing method to de- 
termine the density of states. The dynamics follows the usual multi-canonical 
simulation or entropic sampling, by the single-spin- flip rate min(l, n{E)/n{E')^ . 
However, n{E) is not a constant, but is updated at each step of trial move with 

n{E) ^ n{E) f (72) 

for the current energy E. This is somewhat like the Lee's method of entropic 
sampling, but to some extent the updating of the weights are done at every 
move. If / were greater than 1, the algorithm would never converge, so the 
idea is to reduce / after some Monte Carlo steps, by / <— f^^''^, for example. 
A criterion of flatness of the histogram was used to determine if / should be 
reduced. 

Wang and Landau's idea can be adopted in the context of transition ma- 
trix. For example, we can consider updating the logarithm of density of states, 
S{E) = lnn(_E), using the information from the transition matrix by 

S{E) ^ SiE) + 77(5P'°d(S) - S{E)^ , (73) 

where < 77 < 1 is some small parameter and 

S.-(E) = -i|:(s(^', + l.^f^) (74) 

is the predicted logarithmic density of states, based on M possible hops from 
E' to the current E. If we already know the ground state degeneracy, we 
can fix it to the constant. Unlike the updating rule n{E) *— n{E)f which 
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Figure 8: Average relative error in the density of states of the two-dimensional 
Ising model for various sizes at fixed Monte Carlo steps. The two-stage algorithm 
[circles] used 2 x 10^ Monte Carlo steps (f 0^ for stage 1 with algorithm and 
10^ for stage 2 with algorithm —1, using the result oin{E) of stage 1 as input). 
The Wang and Landau method [squares] and the iV-fold way algorithm flat 
histogram [open triangles] used 10^ steps, respectively. The parameters used 
in the Wang and Landau's program are 80% flatness criterion, /□ = 2.718, 
/min = 1, and checking for flatness at every 10'^ steps. 

makes n{E) grow indefinitely, S{E) will converge to the exact value. This 
generalization does give more accurate results than algorithm if it converges 
to flat histogram. However, it appears to have the problem of sticking to a 
Gaussian-like distribution for the histogram for large systems. 

We made an extensive test of the accuracy of the random walk algorithm 
of Wang and Landau. In general, the random walk algorithm is far inferior 
to the flat histogram algorithm in terms of accuracy and rate of convergence 
to flatness, particularly for small systems. For the original implementation of 
the method, once the system passes the transient period, the error becomes 
independent of the total Monte Carlo steps used and the system sizes, and is 
primarily determined by how slowly / is reduced. This feature is useful for its 
robustness, particularly for large systems. 

In Fig. ^ we plot average errors in the density of states as defined by Eq. ( |6l| ) 
for fixed Monte Carlo steps of 10® (2 x 10® for the two-stage algorithm). It is 
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clear that all methods have bigger errors for larger systems. But the random 
walk algorithm is generally order of magnitudes worse than the best transition 
matrix based methods. The sharp increase of the error with system sizes from 
16 to 32, and to 50 for the random walk algorithm is an indication that with 10^ 
steps, the system is still in transients for these sizes. If 10^ Monte Carlo steps 
are used, the random walk algorithm comes closer to the accuracy of algorithm 
at size L = 50. The two-stage method (algorithm followed by algorithm — 1, 
both using A^-fold way) gives the best performance. 

6 Generalization and Some Comments 

The transition matrix approach can be used for continuous degrees of freedom. 
In such applications, both the dynamic variables and energy spectrum have to 
be discretized. Mufioz et al have applied the broad histogram method to the 
XY model | |4C| ] . The important issue here is how to control the numerical error 
caused by discretization. 

It is straightforward to generalize the transition matrix to more than one 
macroscopic variable, such as energy and magnetization; in fact, this has already 
been done | |4l|] with the broad histogram method. This approach may have 
problems, particularly when the Hamiltonian is complicated. First, the matrix 
may be too large to handle in general. Second, with more elements to fill, the 
statistics for individual entry is poor. This makes the method less robust. 

The transition matrix simulation can be parallelized very efficiently, where 
each processor works on separate configurations, using and updating a common 
transition matrix. Each processor can be limited to work on a range of energies. 
The advantage of this is clearly a fast way of approaching the flat distribution 
for the histogram. 

7 Conclusion 

Starting from the detailed balance equation, we have formulated the transi- 
tion matrix in energy. The infinite-temperature version of this matrix serves 
as the basic data from which we determine the density of states and at the 
same time is used for construction of flat-histogram and equal-hit algorithms. 
This method of simulation together with optimization method to determine the 
density of states offers a better way of computing thermodynamic quantities 
by Monte Carlo simulation. In such an approach, a single simulation produces 
the whole function of temperature (or other parameter of the model) through 
re-weighting. It is efficient and easy to implement. As the use of accumulated 
average for the transition rates causes slow convergence for large systems, a 
two-stage iteration is recommended and is enough to get the best convergence. 
Dynamically, the flat-histogram algorithm for long simulations is equivalent to 
the multi-canonical method. Using the equal-hit algorithm together with A^-fold 
way offers additional benefits. 
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Appendices 

A Exact expression for the energy transition ma- 
trix in one dimension 

We consider the single-spin-flip dynamics with a random pick of spins and the 
Glauber flip rate 

a{E ^ E + AE) = i 

In one dimension for the Ising model, the quantity (iV(cr, AE)) e can be evalu- 
ated exactly. Let us first define a set of new variables rii = {oiUi+i + l)/2. rii 
is 1 for a satisfied bond, and for an unsatisfied bond. The mapping from a 
to n is unique modulo an overall spin up-down symmetry. We assume periodic 
boundary condition and lattice size N to be even. There are three possible 
energy changes, — 4J, 0, -|-4J. If the original spin of a site and the spins of 
two neighboring sites all have the same sign, it contributes one count for the 
total N{a,4J). In terms of rii, it requires two consecutive satisfied bonds, i.e., 
niUi+i = 1. Thus we can express N{a, -|-4J) in terms of rij as 

UiTii+i. (A.2) 

i=l 

Note that only N — 1 variables nt, i = 1,2, . . . ,N — 1, are independent (since 
X^i^i '^i must be even). 

A microcanonical average at energy E needs to be carried out. Let us use 
k to label the equally spaced energy levels, k = 0,1,2, ... ,N/2. Then E = 
—N J + AJk, and Ui = n — N ~ 2k. The microcanonical average can be 
expressed as a summation over all rii subject to A; = {E/J + N)/A being an 
integer constant. Thus we have 

N 

n{E)(N{a, +U))e = 2 ^ ^ mm+u (A.3) 

y~] . ni=n 

and similarly 

N 

n{E){N{a,-AJ))E = 2j2 ^(1 - nO(l - n,+i). (A.4) 

y~] . ni=n 



tanh 



AE 
2kBT 



(A.1) 
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The factor 2 is due to the two-to-one mapping from cr to n. In order to com- 
pute the above sums, we consider the statistical mechanics problem of a onc- 
dimensional lattice gas with the Hamiltonian, 



N 



N 



(A.5) 



The partition function of this system (at l/ksT — 1) is 



CN ^ \ 

1=1 1=1 ) 



(A.6) 



Taking the derivative with respect to e, we have 



dZ_ 
97 



N 



N 



N 



(A.7) 



=0 "=o 



where = . The desired quantity is obtained from the generating function Z 
as {N {(j^+AJ)) E = ^^^^y^/n(i?). The partition function is obtained by 

the standard trick of transfer matrix. We find 



where 



are the eigenvalues of the matrix 



After some algebra, we find 



1 



dZ_ 
97 



N~2 



e=0 



n! (iV-2-n)! 

n— ^ ' 



Thus 



.(i.)(iV(a,+4J))(_.,,,,, = 2A- . (..^y,,;^,), 

A similar derivation from a slightly different Hamiltonian gives 

2iV(Ar - 2)! 



n(i;)(iV(a,-4J))(_jv+4fc),/ = 2A- 



(2A:-2)!(7V-2fc)!' 



(A.8) 
(A.9) 

(A.IO) 
(A.ll) 
(A.12) 
(A.13) 



Combining the above results with the density of states for the Ising model, 
n{E) — 2 A^!/[(2fc)! {N — 2k)\], which is readily obtained by the combinatorial 
problem of putting 2k unsatisfied bonds in N places, we obtain the expressions 
given in Eq. (|65|) and (|66|). 
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B Transition matrix dynamics in the continuum 
limit 



We start from the dynamical equation 
dh{E, t) 



at 



J2HE',t)fiE' ^E), (A.14) 



where time t is continuous but energy E is discrete. The aim is to consider the 
continuous energy Umit. This limit is natural and is a very good approximation 
for large systems. We follow the general method known as 17 expansion |^ . 
Let us introduce a new variable, 

(A.15) 



TV 

where uq will be determined later. Since E is of order N, naively x is of order 
VN. However, by choosing uq to be the average of E/N, we cancel the leading N 
dependence; x is actually of order 1, measuring the relative fluctuation around 
mean. We look for nontrivial solution in variable x in the scaling limit of 
TV — > cx), keeping x fixed. More precisely, we find equation in x such that 
the coefficients of the differential equation are independent of iV. Consider the 
function in terms of x as h{x) = h{Nuo + x^/N). We also write T in a; as 



T,{x) = T{Nuo + xViV + ia-^ Nuq + xVN), (A.16) 

where i = 0, ±1, ±2, • • • , zLd is the change of energy associated with E' , and 
E' = E + i a, a — 4 J. We assume a d-dimensional Ising model in the derivation. 



Replacing E by Nuo + xy/N, E' by Nuq + xvN + i a, Equation (A.14) becomes 
dh{x,t) ^ r9.f^if , *a 



at 

i=0,±l,---,±d 

The crucial step now is to take Taylor expansion assuming a/^/N small, and 
to find equation that is leading order in the large N limit. For notational 
convenient, we shall drop the tildes on T and h, which actually denote different 
functions. 

We know in the limit iV oo, h{x,t) and Ti(x) are smooth functions in x. 
Then (omitting the variable t for clarity) 

h(x + = h{x) + ^h'{x) + U ^\ ' h"{x) + 0{1/N^'^)h"'{x). 



(A.18) 

The primes denote derivatives with respect to x. We should note that in the 
large N limit with x fixed, h{x) and its derivatives do not contain A^. The N 
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dependence is made explicit by the above expansion. Substituting Eq. (A. 18) 
into Eq. (A. 17), we have 

dh{x) 



dt 



A{x)h{x) + B{x)h'{x) + C{x)h"{x) + 0{\/^/N)h"'{x), (A.19) 



where 



i 



(A.20) 
(A.21) 
(A.22) 



Naively, since T is 0(iV), we expect A{x) ~ 0{N), B{x) - 0{y/N), and C{x) ~ 
0(1) and the equation does not have a well-defined large N limit. But this is 
not true, due to special relations among Ti. Two relations are important in the 
derivation below to show that both A{x), B{x), and C{x) are of 0(1) and the 
third derivative can be dropped in the large TV limit. 
The existence of an equilibrium implies 



(i;' E) ^0. 



(A.23) 



Expressed in Ti{x), it is 



J2 TAx 

i=0,±l,---,±d 



I a 



= 0. 



Thus, using To(a;) from the above 
A{x) 



ro(x)+^T,(a;) 



T,{x)-T,{x 



E 

i#0 



(A.24) 

(A.25) 
(A.26) 
(A.27) 



The last equation above used a Taylor expansion. Since Ti{x) has a scaling 
form Ti{x) ~ N g{x / \/N) in the large N limit, we find that the fc-th derivative 
of T^{x) at X = is of order T^{Q) - 0{N'^-^/'^). Thus T[(x) is of order //V, 
and we can safely replace a; by 0, and find 



A{x)^\Y^Tm 



O 



A + O 



(A.28) 
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For B{x), we make an expansion in x for Ti{x) and find 

B{x) = Y,T,{x)^ (A.29) 

= E + 7^'(0)4^x + ii;"(0)4^x2 + . . .1 (A.30) 



where in the last formula, the first term is of order v iV, the second term is of 
order 1, and last term is of order 1/^/N. If the first term were there, we would 
have an ill-defined limit. So we must require that 

I? = ^T,(0)^=0. (A.31) 

This is in fact a condition to fix uq. We shall show that this condition requires 
uq — {E /N)t, the canonical average of energy per spin. 

We evoke the energy detailed balance equation, (|l5|). In terms of the new 
variables x and i, it is 

T,{x)h,q (x + ^^= \ = T_, (x + -^=\ heqix). (A.32) 



Let S — la/yN, Taylor expanding the terms involving 5, we find: 

r,(0) - T_,;(0) - -r,(0) [ln/ieg(0)]' 5 + + O (^^) . (A.33) 



Note that the first term is of order 0(v N), the second term of order 0(1). It is 
important to realize that we are looking for the scaling limit of — > oo, fixing 
x. 

Substituting this equation into the expression for D, we find 



D= [Tfc(0)~r_.(0)]^« fc'^fc(0)[ln/^e,(0)r^. (A.34) 



The requirement that Z? = is equivalent to say that a; = is at the extreme 
of equilibrium probability distribution. 

When the first term in B{x) is set to 0, we have 

The coefficient C{x) is a constant to leading order in N: 
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= lE{Tm^-^ + TW^^ + -] (A.37) 
- \Y.T'^i^)^^^Oi^)^C^Oi^). (A.38) 

i 

The equilibrium distribution of energy for large system is a Gaussian dis- 
tribution with mean {E)j- = Nuq, and variance Nc where c is reduced specific 
heat per spin, thus 

heq{x) = -1= cxp ( - ^ ) . (A.39) 



V 2c y 

Substituting this result into the partial differential equation in equilibrium 

dh 

— ^Ah + Axh' + Ch" ^ 0, (A.40) 
at 

we find C — Ac. This same relation can also be obtained using detailed balance 
equation. Changing variables from t to t' ~ At and from x to x' — x/\/E, we 
obtain Eq. (H). 

C Data file for errors 

The file below is the raw data for various errors. Formally this is not part of the 
paper. We included here which could be useful for future benchmarking use. 

Convergence Test Data 

MCDIS = -1, SIZE L = 5, N-Fold-Way, 2D Ising, algo -1 to 12 
e_T = sum I T - T_ex I 

mcs -1 1 2 3 4 5 6 7 8 9 10 11 12 

1 14.961 15.059 15.043 13.995 14.299 15.357 13.330 14.241 14.455 15.883 13.859 16.06 13.776 15.237 

10 6.480 8.107 7.795 6.509 6.33 8.83 5.52 5.40 6.79 9.91 6.32 9.68 6.41 8.09 

100 0.681 0.819 0.736 0.759 2.28 1.09 0.710 0.713 0.667 1.29 1.47 1.00 1.171 0.825 

1000 0.188 0.204 0.210 0.234 1.50 0.320 0.221 0.221 0.199 0.226 0.581 0.227 0.370 0.256 

le4 0.0608 0.0644 0.0663 0.080 1.00 0.14 0.070 0.070 0.064 0.076 0.539 0.079 0.256 0.094 

le5 0.0181 0.0206 0.0215 0.030 0.698 0.070 0.0219 0.0224 0.0201 0.0246 0.5353 0.026 0.2136 0.0349 

le6 0.0060 0.0064 0.0070 0.0098 0.43 0.044 0.0067 0.0077 0.0064 0.0085 0.533 0.008 0.131 0.0144 

le7 0.00186 0.0020 0.0021 0.0040 0.357 0.0172 0.0024 0.0023 0.0021 0.0030 0.5342 0.0026 0.0926 0.0047 

leS 0.00062 0.00066 0.29 0.012 0.0842 

same as above parameters, but with single-spin-flip (i.e., no N-fold) 

mcs -1 1 2 3 4 5 6 7 8 9 10 11 12 



1 


16, 


.448 


15.26 


15.263 


10 


9, 


.329 


9. 


162 


9. 


165 


100 


1, 


.286 


1. 


583 


1. 


599 


1000 


0, 


.287 


0. 


312 


0. 


310 


le4 


0, 


.093 


0. 


097 


0. 


095 


le5 





.0287 


0. 


0311 


0. 


0314 



algo no (error at N-fold mcs=le4) 
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best -1 (0.061) 

very good 0,1,7 (0.065) 

good 5,6,8 (0.07) 

OK 2,10 (0.08) 

not so good 4,12 (0.1) 

bad (converge) 3,11 (>0.25) 

don't converge 9 (0.5) 



All the rest has MCDIS=0 (no MCS discarded) 



2D Islng model, e. 


.n = (1/(N-1)) 


sum 1 n(E)/n. 


.ex(E) - 1 


1 




algo (N-f old-way) 






mcs L=4 


L=8 


L=16 


L=32 


L=50 


1 1.200 










10 0.152 


1.38- 








100 0.0340 


0.212 


1.3- 






1000 0.0109 


0.0492 


0.20 


1- 




le4 0.00342 


0.0154 


0.0568 


0.28 


0.31 


le5 0.00108 


0.0054 


0.0189 


0.084 


0.23 


le6 0.000338 


0.00154 


0.0066 


0.07 


0.22+/-0.02 


le7 0.000110 


0.00051 


0.0024 


0.042 


0.191+/-0.0138 


le8 0.000035 


0.000190 


0.0009 


0.022 


0.16 


"wide distribution, occasionally not converging 




tunnel 18 . 983 


144.68 


983 


6448 




time 










2D Ising model, e. 


.n = (1/(N-1)) 


sum 1 n(E)/n_ 


.ex(E) - 1 


1 


algo 


1 (N-f old-way, 


equal-hit) 






mcs L=4 


L=8 


L=16 


L=32 


L=50 


1 1.20 










10 0.142 


1.27 








100 0.0410 


0.178 








1000 0.0134 


0.051 


0.26 






le4 0.0042 


0.0152 


0.048 


0.18- 


0.4 


le5 0.00138 


0.0050 


0.015 


0.04 


0.12- 


le6 0.00044 










le7 0.00015 










2D Ising model, e. 


.n = (1/(N-1)) 


sum 1 n(E)/n. 


.ex(E) - 1 


1 


algo -1 


(N-f old-way) (exact rate, best we can 


do) 


mcs L=4 


L=8 


L=16 


L=32 


L=50 


1 1.32 










10 0.131 


14 








100 0.031 


0.18 








1000 0.0097 


0.038 


0.16 






le4 0.0030 


0.0125 


0.041 


0.13 


0.23 


le5 0.00096 


0.00383 


0.0121 


0.037 


0.07 


le6 0.00031 


0.00119 


0.0037 


0.0114 


0.024 


le7 0.000095 


0.00040 


0.00122 


0.00337 


0.007 


le8 0.000032 


0.000103 


0.00045 


0.00091 




Ising model, same 


as above algo 


-1 N-f old-way, but use 


histogram to 


compute n(E) , i.e 


. Berg/Lee method 






mcs L=4 


L=8 


L=16 


L=32 


L=50 


le5 0.010 


0.033 


0.0733 


0.32 


0.5 


le6 0.0033 


0.0099 


0.027 


0.058 


0.10 
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2D Ising model, e_n = (1/(N-1)) sum I ...| 
two stage simulation, stage 1, using algo 0, 

stage 2, using algo -1 of n(E) generated from stage 0. 
same run length for the two stages 



mcs L=4 L=8 L=16 L=32 L=50 

100 0.0311 0.26 

1000 0.0096 0.0404 0.10 

10000 0.0030 0.0122 0.0401 0.2 0.5 

le5 0.00100 0.00349 0.0131 0.038 0.075 

le6 0.00036 0.00123 0.0034 0.0133 0.017 

le7 0.00038 0.0016 0.0029 0.0074 

le8 0.0004 0.0010 

Modified Wang FG/Wang JS method (algo 13'), e_n = (1/(N-1) sum I ...| 
using S(E) <- S(E) + eta (S'pred - S(E)) 

(eta = 0.1) (two numbers - using Tmatrix/using S(E) directly) 

mcs L=4 L=8 L=16 L=32 



100 

1000 0.00936/0.358 

le4 0.00306/0.0087 0.39/0.96 

le5 0.00096/0.0028 0.0039/0.015 0.99/0.99 

le6 0.00115/0.004 0.71/0.99 

le7 0.00049/0.0014 

Wang Fugao's program (fixed MCS, f_max = 2.718, f_min=l, 80°/,H, check every Ik steps) 



e_n = < 


1 n(E)/n. 


.ex(E) - 1 1 > 








mcs 


L=4 


L=8 


L=16 


L=32 


L=50 


le4 


0.172 


10- 








le5 


0.065 


0.073 


1 






le6 


0.066 


0.049 


0.066 


0.36 


9(+/-4) 


le7 


0.055 


0.073 


0.043 


0.05 


0.18 


le8 


0.069 


0.047 


0.07 


0.025 


0.14 


Wang FG 


method e. 


.n (my program 


algo 13, 


807,H, checks 


30 times, single- 


mcs 


L=4 


L=8 


L=16 


L=32 


L=50 


le4 


0.079 


1.62 








le5 


0.034 


0.071 


0.7 






le6 


0.012 


0.031 


0.056 


0.42 


5.5 


le7 


0.0044 


0.010 


0.025 


0.06 


0.14 


le8 


0.0013 








0.02 



!-spin-f lip) 
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